Making a Speech Recognizer Tolerate Non-native Speech through Gaussian Mixture Merging
Abstract
Practicing the spoken language is important for language learners in critical parts of the United States military. Automatic speech recognition (ASR) is a technology that promises to give these learners opportunities for self-paced practice. We propose to improve ASR in computer-assisted language learning (CALL) applications by modeling the speech behavior of language learners. ASR systems trained on native speech perform poorly when used to recognize beginning language learners. Model merging, an adaptation method that uses a phone map derived from a confusion matrix, makes our Arabic speech recognizers more tolerant of Anglophone students. Hidden Markov Model (HMM) phone sets are trained for English and Arabic, and English phones are then merged into the Arabic phones to create a new Arabic system. We present a data-driven procedure for automatically mapping phones between two HMM sets. Accuracy improved when model merging was combined with other adaptation techniques. These positive results indicate that the mapping of phones, together with their weighting, carries the speech patterns of non-native speakers over into the new system.
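The abstract does not spell out how the confusion matrix becomes a phone map or how the merged Gaussians are weighted. The Python sketch below is one plausible reading, not the authors' procedure. It assumes a confusion matrix of counts of Arabic reference phones recognized as English phones, keeps the English phones above a hypothetical threshold `min_share`, and pools each Arabic HMM state's Gaussians with those of the corresponding English states (assuming both HMM sets share the same state topology), scaling mixture weights by a hypothetical interpolation factor `alpha`. The `GMM` class, both function names, the phone labels, and the counts are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GMM:
    """Diagonal-covariance Gaussian mixture for one HMM state (hypothetical layout)."""
    weights: List[float]
    means: List[List[float]]
    variances: List[List[float]]


def phone_map_from_confusions(
    confusions: Dict[str, Dict[str, int]], min_share: float = 0.1
) -> Dict[str, Dict[str, float]]:
    """Map each Arabic phone to the English phones it is confused with.

    confusions[ar][en] counts how often reference Arabic phone `ar` was
    recognized as English phone `en` (assumed input format). English phones
    below `min_share` of the row total are dropped; the rest are renormalized.
    """
    mapping: Dict[str, Dict[str, float]] = {}
    for ar_phone, row in confusions.items():
        total = sum(row.values())
        if total == 0:
            continue
        kept = {en: n / total for en, n in row.items() if n / total >= min_share}
        norm = sum(kept.values())
        if norm > 0:
            mapping[ar_phone] = {en: s / norm for en, s in kept.items()}
    return mapping


def merge_state(
    arabic: GMM, english: Dict[str, GMM], phone_weights: Dict[str, float], alpha: float = 0.3
) -> GMM:
    """Pool one Arabic state's Gaussians with the matching English states' Gaussians.

    Arabic mixture weights are scaled by (1 - alpha) and each mapped English
    phone's weights by alpha * its mapping weight, so the merged weights still
    sum to 1 when every input mixture does.
    """
    if not phone_weights:
        # No English counterpart: return an unchanged copy of the Arabic state.
        return GMM(list(arabic.weights), [list(m) for m in arabic.means],
                   [list(v) for v in arabic.variances])
    weights = [(1 - alpha) * w for w in arabic.weights]
    means = [list(m) for m in arabic.means]
    variances = [list(v) for v in arabic.variances]
    for en_phone, share in phone_weights.items():
        g = english[en_phone]
        weights += [alpha * share * w for w in g.weights]
        means += [list(m) for m in g.means]
        variances += [list(v) for v in g.variances]
    return GMM(weights, means, variances)


# Toy example with hypothetical phone labels and counts (not data from the paper).
confusions = {"q": {"k": 70, "g": 20, "t": 10}, "H": {"h": 90, "k": 10}}
pmap = phone_map_from_confusions(confusions, min_share=0.15)
arabic_q = GMM([0.6, 0.4], [[0.0], [1.0]], [[1.0], [1.0]])
english_states = {
    "k": GMM([1.0], [[2.0]], [[0.5]]),
    "g": GMM([0.5, 0.5], [[3.0], [4.0]], [[1.0], [1.0]]),
    "h": GMM([1.0], [[5.0]], [[1.0]]),
}
merged_q = merge_state(arabic_q, english_states, pmap["q"], alpha=0.3)
print(len(merged_q.weights), round(sum(merged_q.weights), 6))  # prints: 5 1.0
```

In this sketch, pooling keeps every original Gaussian and only rescales mixture weights, so a learner's realization that resembles an English phone can still score well under the Arabic model; a smaller `alpha` keeps the merged model closer to the native Arabic one.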
Publication date: 2004